ABSTRACT


A method is presented for computing initial vectors to be used 
in conjunction with a numerical optimization procedure for minimizing 
the probability of misclassification. The method is similar to that 
presented in [6]. Preliminary numerical results of both procedures are
presented. 



OBTAINING INITIAL VECTORS FOR MINIMIZING 
THE PROBABILITY OF MISCLASSIFICATION 

L. F. Guseman, Jr. and Bruce P. Marion 


1. Introduction


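Consider a set of m distinct populations $\Pi_1, \Pi_2, \ldots, \Pi_m$ with
positive a priori probabilities $\alpha_1, \alpha_2, \ldots, \alpha_m$ and n-dimensional multivariate
normal conditional density functions defined for $x = (x_1, \ldots, x_n)^T \in R^n$ by

$$p_i(x) = (2\pi)^{-n/2}\,|\Sigma_i|^{-1/2} \exp\left[-\tfrac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i)\right], \quad i = 1, 2, \ldots, m.$$

The parameters $\mu_i$ and $\Sigma_i$ are assumed known with $\Sigma_i$ positive definite and
symmetric. If B is a nonzero 1 x n vector then the populations have
transformed univariate normal conditional density functions defined for
$y = Bx \in R^1$ by

$$p_i(y,B) = (2\pi)^{-1/2}\,(B\Sigma_i B^T)^{-1/2} \exp\left[-\tfrac{1}{2}(y-B\mu_i)^2 (B\Sigma_i B^T)^{-1}\right], \quad i = 1, 2, \ldots, m.$$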


Employing a Bayes optimal (maximum likelihood) classification procedure,
the probability of misclassifying a transformed observation $y = Bx \in R^1$ as a
function of B is given, [1], [3], by

$$g(B) = 1 - \int_{R^1} \max_{1 \le i \le m} \alpha_i\, p_i(y,B)\, dy.$$
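The definition of g(B) can be checked numerically on a small case. The following Python sketch is only an illustration: the function names, the trapezoid-rule quadrature, the integration window, and the two-class example data are all assumptions introduced here, not part of the original procedure.

```python
import math


def normal_pdf(y, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def project(B, mus, sigmas):
    """Projected means B mu_i and variances B Sigma_i B^T for a row vector B."""
    n = len(B)
    means = [sum(B[r] * mu[r] for r in range(n)) for mu in mus]
    variances = [
        sum(B[r] * S[r][c] * B[c] for r in range(n) for c in range(n))
        for S in sigmas
    ]
    return means, variances


def misclassification_probability(B, alphas, mus, sigmas, steps=20000, width=10.0):
    """Trapezoid-rule estimate of g(B) = 1 - integral of max_i alpha_i p_i(y, B)."""
    means, variances = project(B, mus, sigmas)
    spread = width * math.sqrt(max(variances))
    lo, hi = min(means) - spread, max(means) + spread
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        best = max(a * normal_pdf(y, m, v) for a, m, v in zip(alphas, means, variances))
        total += best if 0 < k < steps else 0.5 * best  # half weight at the end points
    return 1.0 - total * h


# Hypothetical two-class example in R^2 (not data from the paper).
alphas = [0.5, 0.5]
mus = [[0.0, 0.0], [2.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
sigmas = [identity, identity]
g_good = misclassification_probability([1.0, 0.0], alphas, mus, sigmas)
g_bad = misclassification_probability([0.0, 1.0], alphas, mus, sigmas)
```

In this example, projecting onto the direction that separates the means gives a value close to 0.1587, while projecting onto the orthogonal direction makes the two projected densities coincide and gives 0.5.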


The resulting optimization problem can then be stated as follows
(see [3]):

Determine a 1 x n vector B of norm one such that

$$g(B) = \min_{\|C\| = 1} g(C).$$

A solution B to the above minimization problem cannot, in general, be
obtained in closed form, and the use of some numerical optimization
procedure is necessary. Any such optimization algorithm requires an initial
vector $B_0$. In Section 2 we present a procedure for computing an initial
vector. The procedure is similar to the procedure presented in [6]. Both
procedures produce a $B_0$ by solving a related fixed point problem which
results when one assumes that

$$\Sigma_1 = \Sigma_2 = \cdots = \Sigma_m = \Sigma.$$

The fixed point problem is solved iteratively and also requires an initial
guess $C_0$. Preliminary numerical results for various choices of $\Sigma$ and
$C_0$ are presented for both procedures.



2. A Method for Determining Initial Vectors

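Let B be a nonzero 1 x n vector, and for $i \ne j$, let $g_{ij}(B)$ denote
the pairwise probability of misclassification for $\Pi_i$ and $\Pi_j$; that is,

$$g_{ij}(B) = \int_{R^1} \min\{\alpha_i p_i(y,B),\ \alpha_j p_j(y,B)\}\, dy.$$

Then, it is well-known [2] that

$$\begin{aligned}
g(B) &\le \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} g_{ij}(B) \\
     &= \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \int_{R^1} \min\{\alpha_i p_i(y,B),\ \alpha_j p_j(y,B)\}\, dy \\
     &\le \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \int_{R^1} \{\alpha_i p_i(y,B)\, \alpha_j p_j(y,B)\}^{1/2}\, dy \\
     &= \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \sqrt{\alpha_i \alpha_j} \int_{R^1} \{p_i(y,B)\, p_j(y,B)\}^{1/2}\, dy.
\end{aligned}$$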


If $i \ne j$, and we let

$$f_{ij}(B) = \int_{R^1} \{p_i(y,B)\, p_j(y,B)\}^{1/2}\, dy,$$

then $g(B) \le f(B)$ where f(B) is given by

$$f(B) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \sqrt{\alpha_i \alpha_j}\, f_{ij}(B).$$


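For the purpose of obtaining a starting vector $B_0$ we attempt to find a
minimum of f subject to the condition that $\Sigma_i = \Sigma$, $i = 1, 2, \ldots, m$. In
this case, the expression for $f_{ij}$, $i \ne j$, is given, [5], by

$$f_{ij}(B) = \exp\left[-\tfrac{1}{8}(B\mu_i - B\mu_j)^T (B\Sigma B^T)^{-1} (B\mu_i - B\mu_j)\right].$$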
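The common-covariance expression can be compared against direct integration. The sketch below assumes the closed form $f_{ij}(B) = \exp[-(B\mu_i - B\mu_j)^2 / (8\, B\Sigma B^T)]$, which is what integrating $\{p_i p_j\}^{1/2}$ yields for two normal densities sharing one variance. The helper names, the midpoint-rule quadrature, and the example data are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def quad_form(B, S):
    """B S B^T for a row vector B and a square matrix S (list of lists)."""
    n = len(B)
    return sum(B[r] * S[r][c] * B[c] for r in range(n) for c in range(n))


def f_pair(B, mu_i, mu_j, sigma):
    """Assumed closed form of f_ij(B) under a common covariance matrix sigma."""
    delta = [a - b for a, b in zip(mu_i, mu_j)]
    return math.exp(-dot(B, delta) ** 2 / (8.0 * quad_form(B, sigma)))


def f_bound(B, alphas, mus, sigma):
    """f(B) = sum over i < j of sqrt(alpha_i alpha_j) f_ij(B)."""
    m = len(mus)
    return sum(
        math.sqrt(alphas[i] * alphas[j]) * f_pair(B, mus[i], mus[j], sigma)
        for i in range(m - 1)
        for j in range(i + 1, m)
    )


def f_pair_numeric(B, mu_i, mu_j, sigma, steps=20000, width=12.0):
    """Midpoint-rule value of the integral of sqrt(p_i p_j), for comparison."""
    var = quad_form(B, sigma)
    a, b = dot(B, mu_i), dot(B, mu_j)
    lo = min(a, b) - width * math.sqrt(var)
    hi = max(a, b) + width * math.sqrt(var)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        # sqrt(p_i p_j) for two normals with the same variance, up to the constant
        total += math.exp(-((y - a) ** 2 + (y - b) ** 2) / (4.0 * var))
    return total * h / math.sqrt(2.0 * math.pi * var)


# Hypothetical three-class data in R^2 (not from the paper).
alphas = [0.2, 0.3, 0.5]
mus = [[0.0, 0.0], [2.0, 1.0], [-1.0, 3.0]]
sigma = [[2.0, 0.5], [0.5, 1.0]]
B = [0.6, 0.8]
closed = f_pair(B, mus[0], mus[1], sigma)
numeric = f_pair_numeric(B, mus[0], mus[1], sigma)
f_value = f_bound(B, alphas, mus, sigma)
f_scaled = f_bound([2.0 * b for b in B], alphas, mus, sigma)
```

The two values `closed` and `numeric` should agree to quadrature accuracy, and `f_scaled` should equal `f_value`, since rescaling B changes the numerator and denominator of the exponent by the same factor.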

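The Gateaux differential, $\delta f(B;C)$, of f at nonzero B in the direction of
a 1 x n vector C is given by

$$\delta f(B;C) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \sqrt{\alpha_i \alpha_j}\ \delta f_{ij}(B;C),$$

where

$$\delta f_{ij}(B;C) = -\tfrac{1}{4}\, f_{ij}(B) \left[ \frac{C(\mu_i-\mu_j)\, B(\mu_i-\mu_j)}{B\Sigma B^T} - \frac{(B(\mu_i-\mu_j))^2\, C\Sigma B^T}{(B\Sigma B^T)^2} \right].$$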


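If B is a nonzero 1 x n vector which minimizes f, then B satisfies the
vector equation

$$\frac{\partial f}{\partial B} = \begin{pmatrix} \delta f(B;C_1) \\ \vdots \\ \delta f(B;C_n) \end{pmatrix} = 0,$$

where $C_j$, $1 \le j \le n$, is the 1 x n vector with a one in the j-th slot and
zeros elsewhere. Letting $\delta_{ij} = \mu_i - \mu_j$, the resulting expression for
$\partial f / \partial B$ is given, [5], by

$$(*) \qquad \frac{\partial f}{\partial B} = -\frac{1}{4} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \sqrt{\alpha_i \alpha_j}\, f_{ij}(B) \left[ \frac{B\delta_{ij}}{B\Sigma B^T}\, \delta_{ij} - \frac{(B\delta_{ij})^2}{(B\Sigma B^T)^2}\, \Sigma B^T \right].$$

Since $f(tB) = f(B)$ for $t \ne 0$, and since f is a continuous function of B,
the problem reduces to minimizing f over the set of 1 x n vectors of norm
one.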
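A gradient expression of this kind is easy to get wrong by a sign or a factor, so a finite-difference comparison is a useful check. The sketch below assumes $f_{ij}(B) = \exp[-(B\delta_{ij})^2 / (8\, B\Sigma B^T)]$ and differentiates it term by term. The function name and the example data are hypothetical.

```python
import numpy as np


def f_and_gradient(B, alphas, mus, sigma):
    """Return f(B) and its gradient for a common covariance matrix sigma."""
    B = np.asarray(B, dtype=float)
    s = B @ sigma @ B                      # scalar B Sigma B^T
    value, grad = 0.0, np.zeros_like(B)
    m = len(mus)
    for i in range(m - 1):
        for j in range(i + 1, m):
            delta = mus[i] - mus[j]        # delta_ij = mu_i - mu_j
            d = B @ delta                  # scalar B delta_ij
            w = np.sqrt(alphas[i] * alphas[j]) * np.exp(-d * d / (8.0 * s))
            value += w
            grad += -0.25 * w * (d * delta / s - d * d * (sigma @ B) / (s * s))
    return value, grad


# Hypothetical three-class data in R^3 (not from the paper).
alphas = [0.2, 0.3, 0.5]
mus = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [3.0, 1.0, 1.0]])
sigma = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])
B = np.array([0.5, -1.0, 2.0])

value, grad = f_and_gradient(B, alphas, mus, sigma)

# Central finite differences along each coordinate direction C_j.
eps = 1e-6
numeric = np.array([
    (f_and_gradient(B + eps * e, alphas, mus, sigma)[0]
     - f_and_gradient(B - eps * e, alphas, mus, sigma)[0]) / (2.0 * eps)
    for e in np.eye(len(B))
])
```

Because f(tB) = f(B), the gradient is orthogonal to B, so `B @ grad` should vanish up to rounding; this gives a second, independent check.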


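Theorem 1. Let $B_0$ be a 1 x n vector of norm one which minimizes f.
Then $B_0$ is a fixed point of

$$H(B) = \frac{L(B)^T \Sigma^{-1}}{\|L(B)^T \Sigma^{-1}\|},$$

where

$$L(B) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \sqrt{\alpha_i \alpha_j}\, f_{ij}(B)\, (B\delta_{ij})\, \delta_{ij}.$$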


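Proof: If $B_0$ minimizes f, then $\frac{\partial f}{\partial B}(B_0) = 0$.
Then from (*)

$$\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \sqrt{\alpha_i \alpha_j}\, f_{ij}(B_0)\, \frac{B_0\delta_{ij}}{B_0 \Sigma B_0^T}\, \delta_{ij} = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \sqrt{\alpha_i \alpha_j}\, f_{ij}(B_0)\, \frac{(B_0\delta_{ij})^2}{(B_0 \Sigma B_0^T)^2}\, \Sigma B_0^T.$$

Letting

$$L(B_0) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \sqrt{\alpha_i \alpha_j}\, f_{ij}(B_0)\, (B_0\delta_{ij})\, \delta_{ij},$$

we have

$$B_0 \Sigma B_0^T\, L(B_0) = \Sigma B_0^T B_0\, L(B_0).$$

Since $\Sigma B_0^T B_0$ has rank one, and $\Sigma B_0^T$ is the eigenvector of $\Sigma B_0^T B_0$
corresponding to the eigenvalue $B_0 \Sigma B_0^T$, it follows that there exists some
$\lambda$ such that

$$L(B_0) = \lambda\, \Sigma B_0^T.$$

Since $B_0 L(B_0) > 0$ and $B_0 \Sigma B_0^T > 0$, it follows that $\lambda > 0$. Then

$$B_0 = \frac{1}{\lambda}\, L(B_0)^T \Sigma^{-1},$$

and since $B_0$ has norm one, it follows that $\lambda = \|L(B_0)^T \Sigma^{-1}\|$.
Hence, if $B_0$ minimizes f, then

$$B_0 = \frac{L(B_0)^T \Sigma^{-1}}{\|L(B_0)^T \Sigma^{-1}\|} = H(B_0).$$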
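The map H can be evaluated directly once L is available. The following sketch assumes $L(B) = \sum_{i<j} \sqrt{\alpha_i \alpha_j}\, f_{ij}(B)\, (B\delta_{ij})\, \delta_{ij}$ with the exponential form of $f_{ij}$. The names `L_map` and `H_map` and the two-class data are hypothetical. With only two classes, L(B) is a multiple of $\mu_1 - \mu_2$, so H(B) reduces to the normalized direction $\Sigma^{-1}(\mu_2 - \mu_1)$ up to sign, which gives a case where the fixed point is known in advance.

```python
import numpy as np


def L_map(B, alphas, mus, sigma):
    """L(B) as a 1-D array, for a common covariance matrix sigma."""
    B = np.asarray(B, dtype=float)
    s = B @ sigma @ B
    out = np.zeros(len(B))
    m = len(mus)
    for i in range(m - 1):
        for j in range(i + 1, m):
            delta = mus[i] - mus[j]
            d = B @ delta
            out += np.sqrt(alphas[i] * alphas[j]) * np.exp(-d * d / (8.0 * s)) * d * delta
    return out


def H_map(B, alphas, mus, sigma):
    """H(B) = L(B)^T Sigma^{-1} / ||L(B)^T Sigma^{-1}||, returned as a 1-D array."""
    # sigma is symmetric, so solving sigma v = L(B) gives the same entries
    # as the row vector L(B)^T Sigma^{-1}.
    v = np.linalg.solve(sigma, L_map(B, alphas, mus, sigma))
    return v / np.linalg.norm(v)


# Hypothetical two-class example in R^2 (not from the paper).
alphas = [0.4, 0.6]
mus = np.array([[0.0, 0.0], [2.0, 1.0]])
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

B_next = H_map(np.array([1.0, 0.0]), alphas, mus, sigma)
B_again = H_map(B_next, alphas, mus, sigma)
```

Here `B_next` is already a fixed point, so `B_again` reproduces it. With more than two classes the weights $f_{ij}(B)$ change with B and the fixed point generally has to be found iteratively.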


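Suppose that A is an n x n matrix satisfying $A \Sigma A^T = I$. For a
1 x n vector C, let

$$L_A(C) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \sqrt{\alpha_i \alpha_j}\, f_{ij}(CA)\, (CA\delta_{ij})\, A\delta_{ij}$$

and let

$$H_A(C) = \frac{L_A(C)^T}{\|L_A(C)^T\|}.$$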


Theorem 2. Let A be an n x n matrix such that $A \Sigma A^T = I$.

(a) If C is a fixed point of $H_A$, then $B = \dfrac{CA}{\|CA\|}$ is a fixed point of H.

(b) If B is a fixed point of H, then $C = \dfrac{BA^{-1}}{\|BA^{-1}\|}$ is a
fixed point of $H_A$.

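Proof:

(a) If $C = H_A(C)$, we have $\|L_A(C)^T\|\, CA = L_A(C)^T A$,
and so $\|L_A(C)^T\|\, \|CA\| = \|L_A(C)^T A\|$. We also note that $\Sigma = A^{-1}(A^T)^{-1}$ and
$L(CA)^T = (A^{-1} L_A(C))^T$. Then

$$\begin{aligned}
H(B) &= \frac{L(CA)^T \Sigma^{-1}}{\|L(CA)^T \Sigma^{-1}\|}
      = \frac{(A^{-1} L_A(C))^T \Sigma^{-1}}{\|(A^{-1} L_A(C))^T \Sigma^{-1}\|}
      = \frac{L_A(C)^T (A^T)^{-1} A^T A}{\|L_A(C)^T (A^T)^{-1} A^T A\|} \\
     &= \frac{L_A(C)^T A}{\|L_A(C)^T A\|}
      = \frac{L_A(C)^T A}{\|L_A(C)^T\|\, \|CA\|}
      = \frac{CA}{\|CA\|} = B.
\end{aligned}$$

(b) If $H(B) = B$, then $\|L(B)^T \Sigma^{-1}\|\, BA^{-1} = L(B)^T \Sigma^{-1} A^{-1}$.
Letting $C = \dfrac{BA^{-1}}{\|BA^{-1}\|}$, we have

$$\begin{aligned}
H_A(C) &= \frac{L_A(C)^T}{\|L_A(C)^T\|}
        = \frac{L_A(BA^{-1})^T}{\|L_A(BA^{-1})^T\|}
        = \frac{(A\, L(B))^T}{\|(A\, L(B))^T\|}
        = \frac{L(B)^T A^T A A^{-1}}{\|L(B)^T A^T A A^{-1}\|} \\
       &= \frac{L(B)^T \Sigma^{-1} A^{-1}}{\|L(B)^T \Sigma^{-1} A^{-1}\|}
        = \frac{BA^{-1}}{\|BA^{-1}\|} = C.
\end{aligned}$$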
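The identity $L(CA) = A^{-1} L_A(C)$ used in the proof can be verified numerically. The sketch below makes two assumptions that are labeled in the code: A is taken to be the inverse of the Cholesky factor of $\Sigma$ (one of many matrices with $A \Sigma A^T = I$), and $L_A$ is evaluated as L with the means replaced by $A\mu_i$ and the covariance replaced by the identity. The function names and data are hypothetical.

```python
import numpy as np


def L_map(B, alphas, mus, sigma):
    """L(B) = sum_{i<j} sqrt(a_i a_j) f_ij(B) (B delta_ij) delta_ij as a 1-D array."""
    s = B @ sigma @ B
    out = np.zeros(len(B))
    for i in range(len(mus) - 1):
        for j in range(i + 1, len(mus)):
            delta = mus[i] - mus[j]
            d = B @ delta
            out += np.sqrt(alphas[i] * alphas[j]) * np.exp(-d * d / (8.0 * s)) * d * delta
    return out


def whitening_matrix(sigma):
    """Assumed choice of A with A sigma A^T = I: the inverse Cholesky factor."""
    return np.linalg.inv(np.linalg.cholesky(sigma))


# Hypothetical three-class data in R^3 (not from the paper).
alphas = [0.2, 0.3, 0.5]
mus = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [3.0, 1.0, 1.0]])
sigma = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])

A = whitening_matrix(sigma)
C = np.array([0.5, -1.0, 2.0])

# L_A(C): the rows of mus @ A.T are the transformed means (A mu_i)^T.
L_A = L_map(C, alphas, mus @ A.T, np.eye(3))
lhs = L_map(C @ A, alphas, mus, sigma)     # L(CA)
rhs = np.linalg.solve(A, L_A)              # A^{-1} L_A(C)
```

Any other A with $A \Sigma A^T = I$, for example one built from an eigendecomposition of $\Sigma$, could be substituted for the Cholesky-based choice without changing the comparison of `lhs` and `rhs`.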
In light of Theorem 2, the problem of minimizing f reduces to
finding a fixed point of $H_A$. Thus we have the following procedure:

a. Given $\mu_i$ and $\Sigma_i$, $1 \le i \le m$, compute $\Sigma$ from the $\Sigma_i$
(three different ways of computing $\Sigma$ are discussed in
Section 3).

b. Determine A such that $A \Sigma A^T = I$.

c. Using an initial guess $C_0$ for the fixed point of $H_A$, compute
successive vectors using the mean iteration formula
(see [4])

$$C_{n+1} = \frac{n}{n+1}\, C_n + \frac{1}{n+1}\, H_A(C_n).$$

d. If the sequence $\{C_n\}$ converges to C, then $C = H_A(C)$, and

$$B_0 = \frac{CA}{\|CA\|}$$

is the initial vector for the numerical optimization
procedure used to minimize

$$g(B) = 1 - \int_{R^1} \max_{1 \le i \le m} \alpha_i\, p_i(y,B)\, dy,$$

where the parameters for $p_i$ are given by $\mu_i$ and $\Sigma_i$, $1 \le i \le m$.
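Steps b through d can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: A is taken as the inverse Cholesky factor of $\Sigma$, $H_A$ uses the exponential form of $f_{ij}$, the stopping rule tests the residual $\|H_A(C_n) - C_n\|$, and all names and data are hypothetical. The example uses collinear class means so that the answer is known in closed form (the direction $\Sigma^{-1}u$, with u the common direction of the mean differences).

```python
import numpy as np


def H_A(C, alphas, white_mus):
    """Normalized L_A(C), using transformed means A mu_i and identity covariance."""
    s = C @ C
    out = np.zeros(len(C))
    m = len(white_mus)
    for i in range(m - 1):
        for j in range(i + 1, m):
            delta = white_mus[i] - white_mus[j]
            d = C @ delta
            out += np.sqrt(alphas[i] * alphas[j]) * np.exp(-d * d / (8.0 * s)) * d * delta
    # Undefined (zero norm) if C is orthogonal to every A delta_ij.
    return out / np.linalg.norm(out)


def initial_vector(alphas, mus, sigma, C0, max_iter=1000, tol=1e-10):
    """Steps b-d: transform, run the mean iteration on H_A, map back to B_0."""
    A = np.linalg.inv(np.linalg.cholesky(sigma))   # step b (assumed choice of A)
    white_mus = mus @ A.T
    C = np.asarray(C0, dtype=float)
    converged = False
    for n in range(max_iter):
        target = H_A(C, alphas, white_mus)
        if np.linalg.norm(target - C) < tol:       # C is numerically a fixed point
            converged = True
            break
        C = (n * C + target) / (n + 1)             # step c: mean iteration
    B0 = C @ A                                     # step d: B_0 = CA / ||CA||
    return B0 / np.linalg.norm(B0), converged


# Hypothetical data with collinear means (not from the paper).
alphas = [0.2, 0.3, 0.5]
mus = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
B0, converged = initial_vector(alphas, mus, sigma, C0=[1.0, 0.0])
```

For general data the mean iteration moves by steps of order 1/n, so a residual test of this kind may need many iterations or a looser tolerance, and the loop can end with `converged` still false.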

The procedure in [6] is the same as the above procedure with the
functions L, H, $L_A$, and $H_A$ replaced with the functions F, G, $F_A$, and $G_A$,
respectively, where

$$F(B) = \sum_{j=1}^{m-1} \alpha_j\, p_j(a_j; B)\, (\mu_{j+1} - \mu_j),$$

and the indices for the $\mu_i$'s are chosen (for a given B) such that

$$B\mu_1 < B\mu_2 < \cdots < B\mu_m,$$

$$a_j = \frac{\ln(\alpha_j / \alpha_{j+1})\, B\Sigma B^T}{B(\mu_{j+1} - \mu_j)} + \frac{B(\mu_{j+1} + \mu_j)}{2},$$

$$G(B) = \frac{F(B)^T \Sigma^{-1}}{\|F(B)^T \Sigma^{-1}\|},$$

and $F_A$, $G_A$ are the resulting expressions of F and G above when $\mu_i$ is replaced by $A\mu_i$
and $\Sigma$ by $A \Sigma A^T = I$.
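The functions F and G can be sketched in the same style. The version below assumes that $a_j$ is the point where $\alpha_j p_j$ and $\alpha_{j+1} p_{j+1}$ cross on the projected line, that is, $a_j = \ln(\alpha_j/\alpha_{j+1})\, B\Sigma B^T / (B(\mu_{j+1}-\mu_j)) + B(\mu_{j+1}+\mu_j)/2$, and that the projected means are distinct. The function names and the two-class data are hypothetical.

```python
import numpy as np


def F_map(B, alphas, mus, sigma):
    """F(B) as a 1-D array, with classes reindexed so that B mu_1 < ... < B mu_m."""
    B = np.asarray(B, dtype=float)
    s = B @ sigma @ B                              # common projected variance
    order = np.argsort(mus @ B)                    # sort classes by projected mean
    a_sorted = np.asarray(alphas, dtype=float)[order]
    mu_sorted = mus[order]
    out = np.zeros(len(B))
    for j in range(len(order) - 1):
        gap = mu_sorted[j + 1] - mu_sorted[j]
        # Assumed crossing point of alpha_j p_j and alpha_{j+1} p_{j+1}.
        a_j = (np.log(a_sorted[j] / a_sorted[j + 1]) * s / (B @ gap)
               + B @ (mu_sorted[j + 1] + mu_sorted[j]) / 2.0)
        p_j = np.exp(-0.5 * (a_j - B @ mu_sorted[j]) ** 2 / s) / np.sqrt(2.0 * np.pi * s)
        out += a_sorted[j] * p_j * gap
    return out


def G_map(B, alphas, mus, sigma):
    """G(B) = F(B)^T Sigma^{-1} / ||F(B)^T Sigma^{-1}||, returned as a 1-D array."""
    v = np.linalg.solve(sigma, F_map(B, alphas, mus, sigma))
    return v / np.linalg.norm(v)


# Hypothetical two-class example in R^2 (not from the paper).
alphas = [0.4, 0.6]
mus = np.array([[0.0, 0.0], [2.0, 1.0]])
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
G_value = G_map(np.array([1.0, 0.0]), alphas, mus, sigma)
```

With two classes F(B) is a positive multiple of the mean difference, so `G_value` is the normalized vector $\Sigma^{-1}(\mu_2 - \mu_1)$ for this example.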

At present there are no theoretical results which ensure that the
sequence $\{C_n\}$ above always converges. Investigations into this and related
problems are underway.



3. Preliminary Numerical Results

For all of the results presented herein we used as signatures the 
12-dimensional mean vectors and 12x12 covariance matrices for 
classes 1-9 of Flight Line 210.

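As possible candidates for the common covariance matrix $\Sigma$, we
investigated the following:

$$(1) \qquad \Sigma = \frac{1}{9}\,(\Sigma_1 + \cdots + \Sigma_9)$$

$$(2) \qquad \Sigma = \sum_{i=1}^{9} \frac{\alpha_i\, |\Sigma_i|}{\sum_{j=1}^{9} \alpha_j\, |\Sigma_j|}\ \Sigma_i$$

$$(3) \qquad \Sigma = \sum_{i=1}^{9} \frac{\alpha_i\, \mathrm{tr}(\Sigma_i)}{\sum_{j=1}^{9} \alpha_j\, \mathrm{tr}(\Sigma_j)}\ \Sigma_i, \quad \text{where } \mathrm{tr}(A) \text{ denotes the trace of } A.$$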
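The three candidates are all weighted averages of the $\Sigma_i$ and differ only in the weights. The sketch below assumes the weights are proportional to 1, to $\alpha_i |\Sigma_i|$ (with $|\Sigma_i|$ read as the determinant), and to $\alpha_i\, \mathrm{tr}(\Sigma_i)$, normalized to sum to one, and it generalizes the factor 1/9 to 1/m. The function name and the small example are hypothetical.

```python
import numpy as np


def common_covariance(sigmas, alphas, formula):
    """Weighted average of the class covariance matrices for formula 1, 2, or 3."""
    sigmas = np.asarray(sigmas, dtype=float)       # shape (m, n, n)
    alphas = np.asarray(alphas, dtype=float)       # shape (m,)
    if formula == 1:
        weights = np.ones(len(sigmas))                           # equal weights
    elif formula == 2:
        weights = alphas * np.linalg.det(sigmas)                 # alpha_i |Sigma_i|
    elif formula == 3:
        weights = alphas * np.trace(sigmas, axis1=1, axis2=2)    # alpha_i tr(Sigma_i)
    else:
        raise ValueError("formula must be 1, 2, or 3")
    weights = weights / weights.sum()
    return np.tensordot(weights, sigmas, axes=1)   # sum_i weights[i] * sigmas[i]


# Hypothetical two-class example (not the Flight Line 210 signatures).
sigmas = [np.eye(2), 2.0 * np.eye(2)]
alphas = [0.5, 0.5]
pooled_1 = common_covariance(sigmas, alphas, 1)
pooled_2 = common_covariance(sigmas, alphas, 2)
pooled_3 = common_covariance(sigmas, alphas, 3)
```

In this example the determinant weights favor the larger covariance matrix more strongly than the trace weights do, which is one way the three choices can lead to different starting vectors.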


As initial guesses, $C_0$, for the fixed points we used both $C_{\max}$ and $C_{\min}$.

The results in Tables 1 and 2 below assumed equal a priori probabilities
($\alpha_i = 1/9$). An unequal a priori probability case is presented in Table 3.

The following notation is used in the tables:

$B_0$: The initial vector determined by the particular
starting procedure; that is, $B_0$ is the computed fixed
point of either G or H.

$B_{\min}$: The vector which minimizes g as determined by the
numerical optimization procedure when using $B_0$ as an
initial vector.

$g(B)$: The value of the probability of misclassification at
B for the general problem (distinct $\Sigma_i$) under consideration.

As can be seen from Tables 1 and 2 below, the procedure developed in
Section 2 produced the best results when $\Sigma$ was computed using formula (2)
and $C_0 = C_{\max}$. The best results for the procedure developed in [6] were
obtained when $\Sigma$ was computed using formula (3) and $C_0 = C_{\max}$.



Formula used          $B_0$ satisfying $B_0 = H(B_0)$       $B_0$ satisfying $B_0 = G(B_0)$
to compute $\Sigma$   $g(B_0)$      $g(B_{\min})$           $g(B_0)$      $g(B_{\min})$

(1)                   37.84         29.20                   33.90         22.51
(2)                   38.77         16.43                   36.16         29.37
(3)                   36.60         29.20                   32.79         16.43

Table 1. $C_0 = C_{\max}$


Formula used          $B_0$ satisfying $B_0 = H(B_0)$       $B_0$ satisfying $B_0 = G(B_0)$
to compute $\Sigma$   $g(B_0)$      $g(B_{\min})$           $g(B_0)$      $g(B_{\min})$

(1)                   37.66         29.20                   29.82         22.51
(2)                   39.49         22.51                   31.32         29.20
(3)                   36.54         29.20                   31.26         29.20

Table 2. $C_0 = C_{\min}$








Formula used          $B_0$ satisfying $B_0 = H(B_0)$       $B_0$ satisfying $B_0 = G(B_0)$
to compute $\Sigma$   $g(B_0)$      $g(B_{\min})$           $g(B_0)$      $g(B_{\min})$

1 ^^\±r? 

(2) 

B 


— 

j 

(3) , 

— 


26.59 

12.40 


^1 ~ ^2 ~ ^3 * ^4 "* 

= ttg - .15, a^ = .02, a^ = ,08 

Table 3. $C_0 = C_{\max}$